enable Pipeline to get device from model #30534

faaany · 2024-04-29T08:19:46Z

What does this PR do?

import torch 
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
print(model.device)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe.model.device)
results = pipe("He's a dreadful magician and")

Currently, the code above will give an output of

cuda:0
cpu

But this is not OK: when users have moved the model to CUDA, Pipeline should not move the model back to CPU without showing any message. This PR makes it possible to let the model stay on its original device. Below is the results after this PR:

cuda:0
cuda:0

@Narsil and @muellerzr

faaany · 2024-04-29T08:23:02Z

@yao-matrix

amyeroberts

Thanks for adding this!

Could you add a test?

muellerzr · 2024-04-29T13:39:39Z

@faaany are we sure that model.device is a thing across all these frameworks?

At most I see ModuleUtilsMixin has device which is PyTorch specific (it gets added to AutoModel, but I'd like to verify the locations of TF and Flax backends having these capabilities to grab the model device. Otherwise we don't really want just None here IMO

faaany · 2024-04-30T10:30:35Z

Thanks for adding this!

Could you add a test?

sure, in which test file should I put this test?

faaany · 2024-04-30T10:37:32Z

@faaany are we sure that model.device is a thing across all these frameworks?

At most I see ModuleUtilsMixin has device which is PyTorch specific (it gets added to AutoModel, but I'd like to verify the locations of TF and Flax backends having these capabilities to grab the model device. Otherwise we don't really want just None here IMO

Good point! Yes, I know that Flax model doesn't have "device". How about moving it inside if is_torch_available() and self.framework == "pt": ? I have updated my code.

Furthermore, I removed the self.device is not None check, because it will never be None. And I also added the logic that model shouldn't be moved, if the model is already on device.

amyeroberts

Thanks for updating and handling the torch case!

Only request is to add a test.

@muellerzr could you give a quick review as you correctly spotted and highlighted the torch vs. other frameworks case?

faaany · 2024-05-11T03:34:04Z

Hi @amyeroberts, sorry for the late response. We had a long holiday here in China. Unit tests are added. Let me explain more about in detail:

There are 3 possibilities for model.device:
a1. user passes device_map to from_pretrained
a2. user doesn't pass device_map to from_pretrained
a3. user manually moves the model to a certain device with to(device) after model is loaded with from_pretrained

There are 2 possibilities for pipeline.device:
b1. user passes device to pipeline
b2. user doesn't pass device to pipeline

Sincea2&b2 is trivial, my unit tests cover the cases a1&b1, a1&b2, a3&b1 and a3&b2. Pls have a review, thx!

amyeroberts

Looks great - thanks for adding the tests and the explanation!

cc @muellerzr For a final double check to make sure this makes sense with accelerate

tests/pipelines/test_pipelines_common.py

muellerzr

Much better, thanks! Agreed post Amy's nit :)

Co-authored-by: amyeroberts <[email protected]>

faaany · 2024-05-13T13:48:58Z

Thanks for the review! @amyeroberts @muellerzr

amyeroberts reviewed Apr 29, 2024

View reviewed changes

faaany added 3 commits April 29, 2024 13:14

check model.device

ae6b3be

fix

b3113ba

style fix

03a87fb

amyeroberts approved these changes Apr 30, 2024

View reviewed changes

faaany and others added 5 commits April 30, 2024 15:32

move model device

be79f66

remove print

dea4dbb

add comment

385e8bd

fix

c1f9440

Merge branch 'huggingface:main' into device

5722822

faaany added 3 commits May 11, 2024 08:29

add unit test

8347472

optimize

868c1ee

change test names and add more cases

f80667d

amyeroberts approved these changes May 13, 2024

View reviewed changes

tests/pipelines/test_pipelines_common.py Outdated Show resolved Hide resolved

muellerzr approved these changes May 13, 2024

View reviewed changes

Update tests/pipelines/test_pipelines_common.py

ff340cb

Co-authored-by: amyeroberts <[email protected]>

amyeroberts merged commit 69d9bca into huggingface:main May 13, 2024
20 checks passed

This was referenced May 23, 2024

[Inference] Fix inference latency issue when weights/neff are separated huggingface/optimum-neuron#584

Merged

Fix pipelines huggingface/optimum-neuron#608

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable Pipeline to get device from model #30534

enable Pipeline to get device from model #30534

faaany commented Apr 29, 2024 •

edited

Loading

faaany commented Apr 29, 2024

amyeroberts left a comment

muellerzr commented Apr 29, 2024

faaany commented Apr 30, 2024

faaany commented Apr 30, 2024

amyeroberts left a comment

faaany commented May 11, 2024 •

edited

Loading

amyeroberts left a comment

muellerzr left a comment

faaany commented May 13, 2024

enable Pipeline to get device from model #30534

enable Pipeline to get device from model #30534

Conversation

faaany commented Apr 29, 2024 • edited Loading

What does this PR do?

faaany commented Apr 29, 2024

amyeroberts left a comment

Choose a reason for hiding this comment

muellerzr commented Apr 29, 2024

faaany commented Apr 30, 2024

faaany commented Apr 30, 2024

amyeroberts left a comment

Choose a reason for hiding this comment

faaany commented May 11, 2024 • edited Loading

amyeroberts left a comment

Choose a reason for hiding this comment

muellerzr left a comment

Choose a reason for hiding this comment

faaany commented May 13, 2024

faaany commented Apr 29, 2024 •

edited

Loading

faaany commented May 11, 2024 •

edited

Loading